
Add thinking support for claude-3-7-sonnet #443

Merged: 1 commit merged into open-webui:main on Mar 6, 2025

Conversation

mikeyobrien

Add support for extended thinking for Claude 3.7 Sonnet

https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking


@roryeckel

roryeckel commented Feb 25, 2025

The chat is working so well for me. However, if I enable reasoning mode, the title and tag generation fail.
Possibly relevant log: pipelines | Unexpected data structure: 'text'
pipelines | Full data: {'type': 'content_block_start', 'index': 0, 'content_block': {'type': 'thinking', 'thinking': '', 'signature': ''}}

@roryeckel

roryeckel commented Feb 25, 2025

Should reasoning be enabled for title & tag generation?
I think it should respect the enable_thinking, which it does! Just thinking out loud.

@roryeckel

The issue is caused by content[0] in the synchronous call being the thinking block instead of the text block.
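
(For illustration, an assumed response shape based on the log above rather than code from this PR: with extended thinking enabled the thinking block comes first in content, so reading content[0]["text"] fails.)

# Assumed shape of the non-streaming response when thinking is enabled:
res = {
    "content": [
        {"type": "thinking", "thinking": "...model reasoning...", "signature": "..."},
        {"type": "text", "text": "the actual answer"},
    ]
}
# content[0] is the thinking block; the text block has to be looked up by type.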

@roryeckel

roryeckel commented Feb 25, 2025

After I tried this:

    def get_completion(self, payload: dict) -> str:
        response = requests.post(self.url, headers=self.headers, json=payload)
        if response.status_code == 200:
            res = response.json()
            print(res)
            if "content" in res and (res_content := res["content"]):
                # Find the first content item that contains text
                for content_item in res_content:
                    if content_item.get("type") == "text" and "text" in content_item and (text := content_item["text"]):
                        return text
            return ""
        else:
            raise Exception(f"Error: {response.status_code} - {response.text}")

then tag generation worked. But title generation did not; I didn't even see the request for it somehow...?

@thiswillbeyourgithub

Does that PR allow setting the reasoning effort?

Also, does it work if I'm using OpenRouter to call Claude?


@arty-hlr arty-hlr left a comment


I would recommend changing the valve to a boolean; it looks better in the UI when setting valves, since it becomes a simple toggle instead of a string.
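
(For illustration, a boolean valve could look like the sketch below; ENABLE_THINKING is a hypothetical name, the point is just that a bool renders as a toggle in the valve UI.)

from pydantic import BaseModel

class Valves(BaseModel):
    ANTHROPIC_API_KEY: str = ""
    # A bool shows up as a simple on/off toggle instead of a free-form string.
    ENABLE_THINKING: bool = False  # hypothetical name for the thinking toggle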

@arty-hlr

@thiswillbeyourgithub Yes, and no. There is a manifold to support reasoning on openrouter (https://github.com/rmarfil3/openwebui-openrouter-reasoning-tokens), but so far I haven't been able to make it work with Claude 3.7; it doesn't display the thinking.

@mikeyobrien
Author

@arty-hlr made the requested changes, switching from string to bool.

> Should reasoning be enabled for title & tag generation? I think it should respect the enable_thinking, which it does! Just thinking out loud.

@roryeckel you were on the right track; I made the changes to only enable reasoning when streaming, and now both title and tag generation are working.

@reasv

reasv commented Feb 25, 2025

I think the amount of reasoning should not just be a "global" setting. You may want to change it on a per-query basis.

The "reasoning_effort" parameter should be used for this. If reasoning is globally enabled in the pipeline, this parameter determines how much reasoning to apply (should be possible to also set it to 0/None to disable reasoning in a specific chat, etc)

Since this parameter is a string, I would recommend supporting both integer values to specify the exact amount of reasoning tokens, as well as a few names for "breakpoints", for example:

{
    "none": None,  # Don't enable thinking
    "low": 1024,   # Minimum budget
    "medium": 4096,
    "high": 16384,
    "max": 32768,
}
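
(A minimal sketch of that parsing, under the assumption it's a plain lookup plus an int() fallback; resolve_budget_tokens is a hypothetical helper, not code from this PR.)

REASONING_EFFORT_BUDGET_TOKEN_MAP = {
    "none": None,   # don't enable thinking
    "low": 1024,    # minimum budget
    "medium": 4096,
    "high": 16384,
    "max": 32768,
}

def resolve_budget_tokens(reasoning_effort):
    """Accept either a named breakpoint or an exact integer token count."""
    effort = str(reasoning_effort or "none").strip().lower()
    if effort in REASONING_EFFORT_BUDGET_TOKEN_MAP:
        return REASONING_EFFORT_BUDGET_TOKEN_MAP[effort]
    try:
        return int(effort)   # e.g. "8000" -> exactly 8000 thinking tokens
    except ValueError:
        return None          # unknown value: treat as thinking disabled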


@roryeckel

I definitely agree with the reasoning effort part. Perhaps instead we define how many tokens are low, medium, and high in the pipeline valves. Then, use the native Open WebUI reasoning effort to map into the token budget.

@reasv

reasv commented Feb 25, 2025

Yeah that could work.
Would avoid hardcoding any values

@mikeyobrien
Author

I like this approach. Makes sense given the effort is logarithmic to the amount of budget tokens.

@mikeyobrien
Author

@reasv Made the changes to remove global configuration of thinking and moved to use the reasoning effort param.

@roryeckel

roryeckel commented Feb 25, 2025

In my local copy I refactored these latest changes like so:

import os

from pydantic import BaseModel

class Pipeline:
    class Valves(BaseModel):
        ANTHROPIC_API_KEY: str = ""
        LOW_REASONING_EFFORT_TOKENS: int = 1024
        MEDIUM_REASONING_EFFORT_TOKENS: int = 4096
        HIGH_REASONING_EFFORT_TOKENS: int = 16384
        MAX_REASONING_EFFORT_TOKENS: int = 32768

    def __init__(self):
        self.type = "manifold"
        self.id = "anthropic"
        self.name = "anthropic/"

        self.valves = self.Valves(
            **{
                "ANTHROPIC_API_KEY": os.getenv(
                    "ANTHROPIC_API_KEY", "your-api-key-here"
                ),
            }
        )
        self.reasoning_effort_map = {
            'none': None,
            'low': self.valves.LOW_REASONING_EFFORT_TOKENS,
            'medium': self.valves.MEDIUM_REASONING_EFFORT_TOKENS,
            'high': self.valves.HIGH_REASONING_EFFORT_TOKENS,
            'max': self.valves.MAX_REASONING_EFFORT_TOKENS
        }

(Then replaced the other usages of REASONING_EFFORT_BUDGET_TOKEN_MAP with self.reasoning_effort_map - sorry I'm not a contributor to pipelines yet so I'm not sure how to propose these changes)

This makes it configurable in the UI.

However, I noticed a bug with both of our changes. The UI reports "medium" reasoning level for DEFAULT, but when using this setting the reasoning effort is actually NONE. Would this bug be on the side of Open WebUI not passing "medium" when it's DEFAULT, or would it be on the pipeline side? Personally, I'm leaning towards "bug in Open WebUI" rather than "bug in this pipeline", because if we treat NONE as MEDIUM, then how would we ever be able to use NONE? I will submit a PR upstream if you agree.

@tjbck
Collaborator

tjbck commented Feb 26, 2025

Ready to be merged?

@mikeyobrien
Author

@roryeckel I think it's just a case of odd UX. From my understanding, the "medium" text is just a placeholder example of what could potentially be entered there and does not take effect until an actual setting is made.

It makes sense that the default is none, given that most models until recently had no concept of reasoning effort.

@mikeyobrien
Author

@tjbck I'm okay with merging this for now and leaving configurable effort to budget tokens in a follow up PR.

@arty-hlr

I just tested it; for me only low and none work as reasoning_effort. medium, high, and max just hang and don't even display "thinking".

That aside, showing "medium" as the default value even though it's actually none is very confusing; I think people will just assume it's broken, so I wouldn't merge in this state.

@roryeckel

roryeckel commented Feb 26, 2025

> I just tested it; for me only low and none work as reasoning_effort. medium, high, and max just hang and don't even display "thinking".
>
> That aside, showing "medium" as the default value even though it's actually none is very confusing; I think people will just assume it's broken, so I wouldn't merge in this state.

My current opinion is that OpenWebUI's reasoning_effort was implemented without consideration for models that can reason but DON'T by default. Remember, the reference implementation was O3, which HAS to reason, so the issues with "none" wouldn't have been caught.

@mikeyobrien
Author

mikeyobrien commented Feb 26, 2025

@arty-hlr the problem is likely that the medium and high budgets are greater than your max_tokens param. If max_tokens < budget_tokens, the output stream hangs. Try raising max_tokens above the mapped budget_tokens value.

I haven't looked in detail into a mechanism to surface the error messages to the user.

@arty-hlr

@mikeyobrien I'll try that, but I'd say this is still not ready for merge as an example in this state.

@arty-hlr

@mikeyobrien I'm confused: in the model settings or controls, that parameter is 128 by default, which doesn't make sense. I always thought those settings made zero sense in open-webui, tbh; what's set in the model advanced settings is not reflected in the controls during chat, and it's just such a mess...

I would be in favor of not setting this with reasoning effort and going back to a valve. As it is, with the current state of those parameters in open-webui, this is unfortunately not usable... :(

@roryeckel

roryeckel commented Feb 27, 2025

> that parameter is 128 by default

AdvancedParams.svelte:

I thought it was 2048? By the way, all these strings are hard-coded, so perhaps an acceptable fix for now would be to change 'medium' to simply not have a default, or to say the default is provider-specific.

@arty-hlr

@roryeckel This is context size; I believe mike was talking about max tokens.

Or is it so confusing that the names don't make sense? idk

@roryeckel

roryeckel commented Feb 27, 2025

> Or is it so confusing that the names don't make sense? idk

Oops, that was my confusion. However, it's still the same story about the hard-coding. About your question from earlier, I think AdvancedParams is a sort of "temporary override" for the defaults that come from the model. You can pretty much ignore the defaults that AdvancedParams shows you, because the value "truly" comes from the model.

Btw, "low" effort is way more than 128 tokens, so I don't think it's num_predict; I think it's Context Length that needs to be changed to fix the freezing issue, though.

@mikeyobrien
Author

This is the issue I see in logs:

Exception: Error: 400 - {"type":"error","error":{"type":"invalid_request_error","message":"`max_tokens` must be greater than `thinking.budget_tokens`. Please consult our documentation at https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#max-tokens-and-context-window-size"}}

I'm opposed to reverting to the global valve setting, since there would then be no mechanism to turn off thinking per-chat. Ideally the error is surfaced to the user.

@reasv

reasv commented Feb 27, 2025

I am also opposed to making the value global. It needs to be settable per-conversation at least.
But this issue really shouldn't be a big deal. The script can check what the max tokens and thinking tokens are, on its own, can it not?

This is not something where we must wait for an error. We can read the settings and see if there is a contradiction.

Either

  1. Automatically and transparently add +thinking_tokens to max_tokens (so max tokens refers to the number of actual response tokens, as it normally does in any other scenario)
  2. throw an error before sending the request if the thinking tokens exceed max_tokens
  3. reduce thinking tokens to fit within max_tokens

I am for solution #1. We can even let users turn it on/off with a valve.
It could also be conditional: if max_tokens is already big enough, don't add anything to it.
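
(A minimal sketch of what solution 1 could look like, using the thinking parameter from the Anthropic extended-thinking docs; apply_thinking and the 4096 default are assumptions for illustration, not the merged code.)

def apply_thinking(payload, budget_tokens):
    """Option 1: grow max_tokens so it keeps meaning 'response tokens' only."""
    if budget_tokens:
        payload["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
        # Anthropic requires max_tokens > thinking.budget_tokens, so only grow
        # it when it is not already big enough (the conditional variant above).
        max_tokens = payload.get("max_tokens", 4096)
        if max_tokens <= budget_tokens:
            payload["max_tokens"] = max_tokens + budget_tokens
    return payload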

@tjbck
Collaborator

tjbck commented Feb 27, 2025

Agreed, the default values in the tooltips should be removed entirely.

@arty-hlr

Yeah, I understand that it would be nice to be able to turn thinking on/off or change how much thinking Claude 3.7 does in the chat. I would agree with solution 1 as well.

@mikeyobrien
Author

Alright, made the following changes:

  1. max_tokens becomes max_tokens + budget_tokens when reasoning is enabled
  2. error messages are now surfaced to inform the user of any issues instead of the stream hanging.

@arty-hlr

Looks good! :)

@LionsAd

LionsAd commented Feb 28, 2025

Nice!

print(f"Full data: {data}")
else:
raise Exception(f"Error: {response.status_code} - {response.text}")
"""Used for title and tag generation"""

Wrong function? I think this should be on get_completion ...


@LionsAd LionsAd left a comment


Except for my one nit:

Works great! (I ported the code over to the function to test it)

So FWIW, "approving of these changes".

):
    try:
        budget_tokens = int(reasoning_effort)
    except ValueError as e:

@LionsAd LionsAd Feb 28, 2025


Suggested change:
- except ValueError as e:
+ except (ValueError, TypeError) as e:


The reason for that is that reasoning_effort can be NoneType, and then this fails with:

ERROR [open_webui.functions] Error: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

We could also check that reasoning_effort is not NoneType. (No idea why body.get() in that case doesn't apply the default value.)
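
(A short self-contained sketch of why catching TypeError matters; to_budget_tokens and the None fallback are assumptions for illustration.)

def to_budget_tokens(reasoning_effort):
    # int(None) raises TypeError and int("medium") raises ValueError, so the
    # broader except clause also covers the DEFAULT/None case described above.
    try:
        return int(reasoning_effort)
    except (ValueError, TypeError):
        return None  # assumed fallback: no explicit thinking budget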

@LionsAd

LionsAd commented Feb 28, 2025

I also found that defaults are not applied properly.

If I set the model to default, reasoning_effort is None - I think because default means to use whatever the pipe considers as default.

But the default says "medium", so it's a little bit confusing.

I found the discussion above.

My take is we can distinguish between "None" (as in NoneType) and "none" as in the string none.

That all said: This really should be a button to "Think" or not "Think" in OpenWeb UI - like in Grok.

So it's kind of an interesting case.

@LionsAd

LionsAd commented Feb 28, 2025

I opened a discussion here to add a "Think" button:

open-webui/open-webui#11006

@LionsAd

LionsAd commented Mar 1, 2025

Now with async and logging (which might be helpful here as well as it makes it super easy to see what is going on):

https://gist.github.com/LionsAd/ed81504e2663dcf33a3d2efc2f9a31f4

@shabie

shabie commented Mar 2, 2025

None of this handles Claude on AWS Bedrock, right...?

@LionsAd

LionsAd commented Mar 2, 2025

"None of this handles the claude with AWS Bedrock right..?"

In theory it could, because AWS Bedrock just speaks the Claude API (for all the Claude models), but you need a special signing key via the boto3 client to make the request to the URL, so it's always a new token.

That said, you can just set up the AWS Bedrock Access Gateway to get an OpenAI-compatible gateway.

Btw, the way they translate "reasoning_effort" into a budget in the gateway is by taking a percentage of the max tokens available for the request.
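
(Roughly sketched, purely as an illustration; the ratios below are placeholders, the gateway's actual percentages live in its repo.)

EFFORT_RATIO = {"low": 0.2, "medium": 0.4, "high": 0.6}  # placeholder values

def effort_to_budget(reasoning_effort, max_tokens):
    """Derive a thinking budget as a share of the request's max_tokens."""
    return int(max_tokens * EFFORT_RATIO.get(reasoning_effort, 0.4))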

You can find the gateway here:

https://github.com/aws-samples/bedrock-access-gateway

If you run OpenWeb UI without Docker, you can also run the gateway in a Python "venv":

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

then start it.

@lentil32

lentil32 commented Mar 3, 2025

For AWS Bedrock users, please see my bare minimum implementation based on @mikeyobrien 's pipeline: https://github.com/lentil32/anthropic_manifold_pipeline_aws_bedrock/

@krittaprot

@LionsAd I just updated the bedrock-access-gateway. After this pull request is merged, do you think I will be able to use claude-3.7-sonnet's thinking capability through the OpenAI API inside Open-WebUI? I really hope it will work without any additional pipeline or function...

@LionsAd

LionsAd commented Mar 3, 2025

@krittaprot Even though it's off-topic, right now I am not seeing the <think> tags in OpenWeb UI:

aws-samples/bedrock-access-gateway#117 fixes it.

@tjbck
Collaborator

tjbck commented Mar 6, 2025

Thanks!

@tjbck tjbck merged commit f89ab37 into open-webui:main Mar 6, 2025